Abstract: Rateless/fountain codes are designed so that all 
input symbols can be recovered from a slightly larger number of 
coded symbols, with high probability using an iterative decoder. 
In this paper we investigate the number of input symbols that 
can be recovered by the same decoder, but when the number of 
coded symbols available is less than the total number of input 
symbols. Of course recovery of all inputs is not possible, and the 
fraction that can be recovered will depend on the output degree 
distribution of the code. 

In this paper we (a) outer bound the fraction of inputs that 
can be recovered for any output degree distribution of the code, 
and (b) design degree distributions which meet/perform close to 
this bound. Our results are of interest for real-time systems using 
rateless codes, and for Raptor-type two-stage designs. 

I. Introduction 

Rateless codes [1,2], or fountain codes, are random linear 
codes designed for communication over erasure channels. 
Given a block of input symbols, a rateless encoder generates a 
potentially infinite sequence of output symbols, each of which 
is generated identically and independently. This process of 
generation is designed so that it is possible to recover all input 
symbols from any slightly larger set of output symbols, with 
high probability when the total number of input symbols is 
large enough. This implies that rateless codes are universally 
capacity achieving. Further, this recovery property can be 
achieved via a simple iterative decoder of low complexity. 

The construction of rateless codes makes them appealing for 
myriad applications. In general, rateless codes are seen to per- 
form well for scenarios where the erasure probability/pattern 
is not known, and in multicast/broadcast applications where 
the encoder outputs onto a shared medium and cannot tune its 
transmissions to individual receivers. 

The design of rateless codes has been optimized so that 
the low-complexity decoder can recover all inputs provided it 
starts with slightly more outputs than inputs. In this paper, we 
investigate the intermediate performance, i.e. the case when 
the number of received output symbols is less than the number 
of input symbols. Of course in this case it is not possible to 
fully recover all input symbols. We investigate the fraction of 
input symbols that can be recovered - by the same iterative 
decoder - as a function of the number of received output 
symbols and the randomized method by which the codes are 
generated. 

Fig. 1. Depiction of main results. Each figure plots the (asymptotic) fraction 
$z$ of recoverable inputs as a function of the normalized number $r$ of received 
outputs. The bold line in the top figure represents a pointwise outer (i.e. 
upper) bound on $z$ as a function of $r$, for any output degree distribution used 
to generate the rateless code. This bound is tight for $z \in [0, \frac{2}{3}]$ and 
$r \in [0, \frac{3}{4}\log 3]$, and the corresponding optimal degree distributions are marked. 
Beyond this region optimal distributions are not known. The second figure 
is this unknown region of the first figure expanded, with the outer bound as 
marked. For this region we develop a series of degree distributions $P_r$, one 
for each $r$. The "inner bound" represents the points $(r, z_r)$ where $z_r$ is the 
corresponding $z$ achieved by $P_r$ at $r$. It can be seen that our distributions 
perform close to (but below) the outer bound. 

The motivations for this investigation are twofold: 

• Codes can be designed - as done in [2] - so that it 
is sufficient to decode only a large enough fraction of 
the inputs, instead of all inputs. The implications of our 
results for the design of such codes are discussed in 
Section V. 

• There may be real-world scenarios where users do not 
receive sufficient output symbols, and scenarios where 
it is sufficient to recover a large enough fraction of the 
inputs instead of all inputs. 

As an example, consider a scenario where a data stream is 
to be transmitted to multiple users over a shared medium. The 
stream needs to be decoded in real time, or near-real time with 
a finite amount of buffering. If rateless codes are to be used 
for the transmission, the stream would have to be broken into 
blocks of symbols, and each block would have to be encoded 
separately. These output symbols would then be transmitted 
over the shared medium. If such a transmission strategy is 
employed, the output symbols from any given block of inputs 
will only be transmitted for a finite amount of time, before the 
encoder moves on to the next block of inputs. 

If the user channel qualities are very disparate, it is possible 
that some user may not receive the requisite number of outputs 
for some input block. It is reasonable to assume that many 
real-time applications, if suitably pre-coded, can be reliably 
played back from a large enough fraction of the inputs. Thus 
it is of interest to understand the intermediate performance - 
how many inputs can be recovered by the decoder from an 
insufficient number of output symbols. 

As it turns out, any rateless code designed to achieve 
capacity will have poor intermediate performance: even if the 
number of outputs is very close to (but not exactly) sufficient, 
the fraction of inputs that can be decoded will be negligible. 
This fragile performance of capacity achieving rateless codes 
motivates us to investigate other rateless codes. In particular, 
each rateless code is associated with a degree distribution, and 
we investigate and bound the performance of all distributions, 
not just the capacity-achieving ones. Figure 1 describes the 
main results of this paper. 

In the following, we first give a brief background on rateless 
codes in Section II, and then develop our investigation of 
intermediate performance in Section III. The main results are 
derived in Section IV, followed by a discussion in Section V. 

II. Background: Rateless Codes 

In this section we briefly describe how rateless codes are 
encoded and decoded. A more thorough explanation and 
treatment can be found in [1]. Throughout this paper, we will 
assume for simplicity that the input and output symbols are 
bits.¹ 

Encoding: Given $k$ input symbols and a probability distribution 
$P$ over $[k]$, the encoder generates each output as follows: 
(a) it chooses a random degree $d$ according to $P$, and (b) it 
chooses a set of $d$ inputs uniformly at random and XORs 
them to get the output. 

Decoding: Given $n$ output symbols, the decoder selects one 
of degree one. The value of the corresponding input is set, and 
that input is then cancelled out of all other outputs it is a part 
of. Decoding stops when there are no degree-one outputs left. 

¹ In practice each symbol is a packet - a vector of bits - and the operations 
described here are carried out in parallel for each position of the vector. 
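The encoding and decoding rules of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the construction of [1]: the names `lt_encode` and `peel_decode`, the dictionary `{degree: probability}` used for the degree distribution, and the restriction to single-bit symbols are all choices made for the example.

```python
import random

def lt_encode(inputs, degree_dist, n, rng):
    """Generate n outputs; each is (set of input indices, XOR of those bits).

    degree_dist is a dict {degree: probability} (an assumed representation).
    """
    k = len(inputs)
    degrees = list(degree_dist.keys())
    weights = list(degree_dist.values())
    outputs = []
    for _ in range(n):
        d = rng.choices(degrees, weights=weights)[0]   # step (a): draw a degree
        neighbours = set(rng.sample(range(k), d))      # step (b): pick d inputs
        value = 0
        for i in neighbours:
            value ^= inputs[i]
        outputs.append((neighbours, value))
    return outputs

def peel_decode(outputs):
    """Iterative decoder: returns {input index: recovered bit}."""
    pending = [[set(nb), v] for nb, v in outputs]
    touching = {}                      # input index -> outputs that contain it
    for j, (nb, _) in enumerate(pending):
        for i in nb:
            touching.setdefault(i, []).append(j)
    ripple = [j for j, (nb, _) in enumerate(pending) if len(nb) == 1]
    recovered = {}
    while ripple:                      # stop when no degree-one output is left
        j = ripple.pop()
        nb, v = pending[j]
        if len(nb) != 1:
            continue                   # already reduced to degree zero
        (i,) = nb
        recovered[i] = v
        for j2 in touching.get(i, []):
            nb2 = pending[j2][0]
            if i in nb2:               # cancel input i out of this output
                nb2.remove(i)
                pending[j2][1] ^= v
                if len(nb2) == 1:
                    ripple.append(j2)
    return recovered
```

With `k` inputs and `n` received outputs, `len(peel_decode(outputs)) / k` is an empirical counterpart of the recovered fraction studied in the rest of the paper.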

The degree distribution is a crucial component of the rateless 
code design above. Clearly, for any code, the decoder needs 
$n \ge k$ to recover all input bits. LT Codes [1], the first 
rateless codes, were designed with the objective of ensuring 
that all $k$ input bits are recovered with high probability from 
$n$ bits when $n$ is not much larger than $k$. In [1] an ideal soliton 
distribution $I_k$ is identified as being the unique distribution 
that will, in expectation, enable efficient decoding with the 
iterative decoder. This distribution performs poorly due to 
random fluctuations, and is thus modified into a robust soliton 
distribution $P_k$ that works well in practice. In the limit as 
$k \to \infty$, both $I_k$ and $P_k$ converge pointwise to the following 
distribution 

$$ I(i) = \begin{cases} 0 & \text{if } i = 1 \\ \dfrac{1}{i(i-1)} & \text{if } i \ge 2 \end{cases} \tag{1} $$

This $I$ is unique: any sequence of degree distributions that 
achieves capacity with iterative decoding will have $I$ as the 
limiting distribution. 

Recall that the iterative decoder requires degree-one outputs 
to start decoding. While for each finite $k$ both $P_k$ and $I_k$ 
will result in some degree-one parities being generated, the 
above limiting distribution has no degree-one output parities. 
This illustrates that given a limiting degree distribution 
with some predicted performance, it is possible to modify it for 
finite $k$ so as to actually achieve the predicted performance. 

III. Problem Statement: Intermediate Performance 

In general, the (distribution of the) number of input bits 
recovered will depend on the number of received parities, the 
degree distribution used for generating the rateless codes, and 
the total number $k$ of input bits. In this paper we will be 
concerned with asymptotic performance, i.e. the case 
when $k$ is large. In particular, given a sequence $\{P_k\}$ of degree 
distributions that converge pointwise to a distribution $P$, we 
will be concerned only with the performance of $P$. Towards 
this end we define the following two quantities 

$$ r_k = \frac{\text{no. of received parity bits}}{\text{no. of input bits } k}, \qquad z_k = \frac{\text{no. of decoded input bits}}{\text{no. of input bits } k} $$

We will be interested in the relation between $r$ and $z$ as $k \to \infty$. 
Towards this end we define $z(r, P)$ to be the limiting value 
of $z_k$ for $r_k \to r$ and $P_k \to P$.² 

As a simple illustrative example, consider again the soliton 
distribution of [1]. In terms of the notation in this paper, the 
result of [1] implies that $z(1, I) = 1$. Note that this is the 
same as saying that $I$ is a capacity achieving distribution. 

As pointed out in [4], recent results on random hypergraphs 
by Darling and Norris [3] can be used to study the asymptotic 
performance of the iterative decoder for the limiting degree 
distribution of the rateless code. To do so we will need some 
additional notation. 

² [3] shows that these limits are well defined. 

First, for any degree distribution $P$, we define its generating 
function $\mathcal{P}(t) = \sum_{i \ge 1} P(i)\, t^i$ and its derivative $\mathcal{P}'(t)$. 
Further, for any real number $r > 0$ we define the term 

$$ s(r,P) = \inf\{t \in [0,1) : r\mathcal{P}'(t) + \log(1-t) < 0\} \wedge 1 \tag{2} $$

We now restate Theorem 2.1 of [3] in the notation developed 
in our paper. 

Theorem 1 (Darling and Norris [3]): Let real number $r > 0$ 
and the limit degree distribution $P$ be such that 

$$ r\mathcal{P}'(t) + \log(1-t) > 0 \quad \text{for } 0 \le t < s(r,P) \tag{3} $$

Then, as $k \to \infty$, if the number of received parity bits is 
Poisson($rk$) then $z_k \to s(r,P)$. 

The above theorem gives a way to calculate the quantity 
$z(r, P)$ of interest: modulo the Poisson approximation, it says 
that for limiting distributions that satisfy the conditions of the 
theorem we have that $z(r, P) = s(r, P)$. 
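As a rough numerical companion to definition (2), the quantity $s(r,P)$ can be approximated by scanning a grid on $[0,1)$. The sketch below assumes a finite-support distribution stored as a dictionary `{degree: probability}`; the names `gen_derivative` and `s_value` and the grid resolution are illustrative choices, and the code does not check condition (3) of Theorem 1.

```python
import math

def gen_derivative(P, t):
    """P'(t) = sum_i i * P(i) * t**(i-1) for a finite-support dict P."""
    return sum(i * p * t ** (i - 1) for i, p in P.items())

def s_value(r, P, grid=100_000):
    """Approximate s(r, P) from (2): the first grid point t in [0, 1) at which
    r * P'(t) + log(1 - t) < 0, or 1 if no such grid point exists."""
    for j in range(grid):
        t = j / grid
        if r * gen_derivative(P, t) + math.log(1.0 - t) < 0:
            return t
    return 1.0
```

For example, `s_value(1.0, {2: 1.0})` returns a value near 0.797, the point where $2t = -\log(1-t)$, while `s_value(0.5, {2: 1.0})` is essentially zero because $t + \log(1-t) < 0$ for every $t > 0$.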

Note that for the ideal soliton distribution $I$ we have that 
$s(1, I) = 1$. However the above theorem, as stated, does not 
apply to the limiting soliton distribution $I$. This is because 
$\mathcal{I}'(0) = 0$ and so for any $r$ we will violate the condition 
$r\mathcal{I}'(0) + \log(1 - 0) > 0$. In fact, for $r = 1$ we have that 
$\mathcal{I}'(t) + \log(1 - t) = 0$ for all $0 \le t < 1$. Thus the above 
theorem cannot be directly used to show that $z(1, I) = 1$. 

The above problem with the soliton distribution illustrates 
why it may be hard to use the above theorem directly to 
evaluate the performance of otherwise interesting degree 
distributions. As a stepping stone to obtaining interesting results 
using the above ideas, we first define a perturbation as follows: 
given a sequence of degree distributions $P_k \to P$ and a real 
number $\delta > 0$, let $Q_k$ be the distribution defined by 

$$ Q_k(1) = \delta + (1-\delta) P_k(1), \qquad Q_k(i) = (1-\delta) P_k(i) \ \text{ for } i \ge 2 $$

In the above we have moved a small amount of mass from the 
higher degrees of $P$ to degree one. Note that now $Q_k \to Q$ 
whose generating function is $\mathcal{Q}(t) = (1-\delta)\mathcal{P}(t) + \delta t$. We 
now use the above theorem to show that the performance of 
the perturbed $Q$ will be close to that predicted by $s(r, P)$, 
even if the original $P$ does not satisfy the condition (3). 

Lemma 1: Given $\epsilon > 0$ there exists $\delta_1 > 0$ so that for any 
$\delta < \delta_1$ and the corresponding perturbed $Q$, as defined above, the 
following holds 

$$ s(r,P) \le z\left(\frac{r}{1-\delta}, Q\right) \le s(r,P) + \epsilon $$

Remark: The above lemma says that, with a slightly higher 
value of $r$, the $\delta$-perturbed distribution has the actual fraction 
$z$ of recovered inputs close to the $s(r, P)$ predicted for $P$ and 
$r$. 

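Proof: Note that $\frac{r}{1-\delta}\mathcal{Q}'(t) = r\mathcal{P}'(t) + \frac{r\delta}{1-\delta}$, from which it 
follows that $s(\frac{r}{1-\delta}, Q) \ge s(r,P)$. It also follows that, given 
$\epsilon$, there exists a $\delta_2$ such that $\delta < \delta_2$ implies that 
$s(\frac{r}{1-\delta}, Q) \le s(r, P) + \epsilon$. 

Also, there exists $\delta_3$ so that for all $\delta < \delta_3$ we have that 
property (3) is satisfied by $\frac{r}{1-\delta}$ and $Q$. For any such $\delta$ we have 
that $z(\frac{r}{1-\delta}, Q) = s(\frac{r}{1-\delta}, Q)$ from Theorem 1. Thus setting 
$\delta_1 = \min\{\delta_2, \delta_3\}$ proves the lemma. ■ 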

In light of the above lemma, for the remaining portion of 
the paper we will be interested in studying and bounding $s(r, P)$ 
as the degree distribution $P$ is allowed to vary. 

IV. Results 

The quantity $s(r, P)$ exists for all $P$, and hence we can 
easily find out the (approximate, asymptotic) relation between 
the number of decoded inputs and the number of received 
parities for any particular rateless code degree distribution 
$P$. In this section we take this as the starting point in our 
investigation of the intermediate performance of rateless codes. 
Lemma 1 in the last section justifies this approach. 

As an illustrative first step, we investigate the performance 
of the limiting soliton distribution $I$ specified by (1). The 
corresponding generating function has derivative $\mathcal{I}'(t) = -\log(1 - t)$. It 
is easy to see that there is a discontinuity at $r = 1$: $s(1, I) = 1$ 
but $s(r, I) = 0$ for every $0 < r < 1$. This means that while 
the soliton achieves capacity, its performance is fragile in the 
sense that even if the number of received parities is slightly 
less than required the fraction of recovered inputs will be very 
small. 
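The discontinuity can be checked directly from definition (2) by substituting $\mathcal{I}'(t) = -\log(1-t)$; the short derivation below is added here as a worked step.

```latex
r\,\mathcal{I}'(t) + \log(1-t) \;=\; -r\log(1-t) + \log(1-t) \;=\; (1-r)\,\log(1-t)
```

For $0 < r < 1$ this is strictly negative for every $t \in (0,1)$, so the infimum in (2) is $0$. For $r = 1$ it is identically zero, the set in (2) is empty, and the minimum with $1$ gives $s(1, I) = 1$.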

We are interested in distributions with optimal intermediate 
performance. Towards this end we define the following terms: 
for each $0 \le r \le 1$ let $P_{(r)}$ be s.t. $s(r, P_{(r)}) \ge s(r, P)$ for all 
$P$. We will be interested in finding, for each $r$, the distribution 
$P_{(r)}$ and corresponding value $s(r, P_{(r)})$. For the purpose of 
analysis, it is convenient to define for each $0 \le z \le 1$ the 
equivalent terms 

$$ r(z,P) = \inf\{r : s(r, P) \ge z\} $$

$$ P_{(z)} \ \text{ s.t. } \ r(z, P_{(z)}) \le r(z, P) \ \text{ for all } P $$

$P_{(z)}$ is the optimal distribution for $z$: it will enable the 
decoding of a fraction $z$ of the inputs from the smallest number 
of received output symbols. Characterizing/bounding $P_{(z)}$ and 
the corresponding value $r(z, P_{(z)})$ for each $z$ is equivalent to a 
characterization in terms of $r$, and we will find this more 
convenient. 

The above objective is achieved exactly for $z \in [0, \frac{2}{3}]$. For 
the remaining values we do not know an exact characterization, 
so we do two things for each $z \in (\frac{2}{3}, 1)$: 

1) Use linear programming duality to lower (i.e. outer) 
bound $r(z, P_{(z)})$. 

2) Design distributions, given in Lemma 4, that perform close to this outer 
bound. 

Note that a lower bound on $r(z, P_{(z)})$ represents an outer 
bound, i.e. it is a quantity that may not be achievable by any 
rateless code degree distribution. 

In the task of finding $P_{(z)}$ the following lemma provides an 
important simplification. 

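Lemma 2: Given $z < 1$, if integer $m \ge 1$ is such that 
$z < \frac{m}{m+1}$ then it has to be that $P_{(z)}(i) = 0$ for all $i \ge m + 1$. 
Alternatively, if $m$ is such that $z = \frac{m}{m+1}$ then there exists an 
optimal $P_{(z)}$ with $P_{(z)}(m + 1) = 0$. 

Proof: 

Recall that $\mathcal{P}'(t) = \sum_{i \ge 1} P(i)\, i\, t^{i-1}$. Now, if $0 < t < \frac{m}{m+1}$ then 
for every $n \ge m + 1$ we have that $(n-1)t^{n-2} > n t^{n-1}$. Thus, 
in particular, we have that $m t^{m-1} > n t^{n-1}$. 

Now, suppose $P_{(z)}$ is such that $\sum_{i > m} P_{(z)}(i) > 0$. Then 
construct a new $\tilde{P}_{(z)}$ by moving all the mass above degree $m$ onto degree $m$: 

$$ \tilde{P}_{(z)}(i) = P_{(z)}(i) \ \text{ for } 1 \le i \le m-1, \qquad \tilde{P}_{(z)}(m) = \sum_{i \ge m} P_{(z)}(i), \qquad \tilde{P}_{(z)}(i) = 0 \ \text{ for } i > m $$

Then, it follows that the corresponding generating functions 
will satisfy $\tilde{\mathcal{P}}'_{(z)}(t) > \mathcal{P}'_{(z)}(t)$ for all $0 < t < \frac{m}{m+1}$ and 
$\tilde{\mathcal{P}}'_{(z)}(t) \ge \mathcal{P}'_{(z)}(t)$ for $t = \frac{m}{m+1}$. 

For $z < \frac{m}{m+1}$ this means that $r(z, \tilde{P}_{(z)}) < r(z, P_{(z)})$, 
contradicting the choice of $P_{(z)}$. Thus for such a $z$ it has to be 
that $\sum_{i > m} P_{(z)}(i) = 0$. 

For $z = \frac{m}{m+1}$ the above means that $r(z, P_{(z)}) \ge r(z, \tilde{P}_{(z)})$ 
and since $P_{(z)}$ is optimal this will be equality. This means 
that the alternate distribution $\tilde{P}_{(z)}$ is also optimal for $z$. The 
lemma is thus proved. ■ 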

As an immediate corollary of the above lemma we have that 
for all $0 \le z \le \frac{1}{2}$ the optimal distribution is $P_{(z)}(1) = 1$ and 
the corresponding $r(z, P_{(z)}) = -\log(1 - z)$. 
In terms of $r$, this means that $P_{(r)}(1) = 1$ and 
$z(r, P_{(r)}) = 1 - e^{-r}$ for $0 \le r \le \log 2$.³ 
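The closed form in this corollary follows from definition (2) in one line. With all mass on degree one, $\mathcal{P}(t) = t$ and $\mathcal{P}'(t) = 1$, so (as a worked step added here):

```latex
r\,\mathcal{P}'(t) + \log(1-t) \;=\; r + \log(1-t) \;\ge\; 0
\iff t \le 1 - e^{-r}
\quad\Longrightarrow\quad
s(r,P) = 1 - e^{-r}, \qquad r(z,P) = -\log(1-z)
```

Setting $z = \frac{1}{2}$ gives $r = \log 2$, which is the right end of the range over which uncoded transmission is optimal.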

Moving on to higher values of $z$ and $r$, Lemma 2 means 
that for any $z < 1$ if $m$ is such that $\frac{m-1}{m} \le z \le \frac{m}{m+1}$ 
then without loss of generality we can restrict attention to 
degree distributions that have support on $[m]$. However, it 
does not immediately provide an exact answer for what the 
optimal distributions $P_{(r)}$ or $P_{(z)}$ are. So we now use linear 
programming duality to provide a bound on $r(z, P_{(z)})$. 

Given a fixed $z < 1$, the optimal $r(z, P_{(z)})$ is the solution 
of the following optimization problem: 

$$ \min_{r, P} \ r \quad \text{s.t.} \quad r\mathcal{P}'(t) + \log(1-t) \ge 0 \ \text{ for } 0 \le t \le z $$

The problem as stated above is not linear since the constraints 
are not linear. However, it can be easily converted into a linear 
program by a change of variables. Specifically, let the sequence 
$a$ be defined by $a(i) = rP(i)$ and its generating function be 
$\mathcal{A}(t) = \sum_i a(i)\, t^i$. Clearly $r = \sum_i a(i)$ and $r\mathcal{P}(t) = \mathcal{A}(t)$. 
If integer $m$ is such that $\frac{m-1}{m} \le z \le \frac{m}{m+1}$ then the above 
problem can be rewritten so that $r(z, P_{(z)})$ is the solution 
of 

$$ \min_{a} \ a(1) + \ldots + a(m) \quad \text{s.t.} \quad \mathcal{A}'(t) + \log(1-t) \ge 0 \ \text{ for } 0 \le t \le z \tag{4} $$

³ This means that if the objective of communication is only the recovery 
of up to half of the inputs, then it is optimal to not employ any coding at 
all! 



This optimization problem is now linear, but with infinite 
dimensional constraints. Nevertheless, it falls within the standard 
theory of such linear programs. Clearly, the problem is 
feasible. Also, we can write down the dual problem 

$$ \max_{f(x)} \ E[-\log(1 - X)] \quad \text{s.t.} \quad X \in [0, z] \ \text{ and } \ E[X^{i-1}] \le \frac{1}{i} \ \text{ for } 1 \le i \le m \tag{5} $$

where the optimization has to be carried out over all distributions 
$f(x)$ of the random variable $X$ that have support only 
in $[0, z]$. 

The dual linear programs above will have no duality gap. 
Thus, we can prove optimality of a particular candidate 
distribution by evaluating the corresponding value of the primal 
objective function, and then constructing a dual solution of 
equal value. This allows us to calculate the optimal distribution 
$P_{(z)}$ and corresponding value $r(z, P_{(z)})$ for a further range of 
$z$'s. 

Lemma 3: For $z \in [\frac{1}{2}, \frac{2}{3}]$, the optimal distribution is 
given by $P_{(z)}(2) = 1$, and the corresponding 

$$ r(z, P_{(z)}) = \frac{-\log(1-z)}{2z} $$

Proof: 

Consider the solution $a(2) = \frac{-\log(1-z)}{2z}$ and $a(i) = 0$ for all 
$i \ne 2$ for the primal. This satisfies the constraint: $\mathcal{A}'(0) = 0$ 
and 

$$ \mathcal{A}'(t) + \log(1-t) = 2t\left( \frac{-\log(1-z)}{2z} + \frac{\log(1-t)}{2t} \right) \ge 0 \quad \text{for } 0 < t \le z $$

where the last inequality follows from the fact that $\frac{-\log(1-t)}{2t}$ 
is a strictly increasing function of $t$. Thus the above solution is 
valid for the primal problem and yields a value of $\frac{-\log(1-z)}{2z}$. 

Consider the dual solution which puts mass $\frac{1}{2z}$ on the point $z$ 
and the remaining mass at the origin. This satisfies the constraints 
because $E[X] = z\left(\frac{1}{2z}\right) = \frac{1}{2}$. Thus it is a valid solution, 
and yields a value of $\frac{-\log(1-z)}{2z}$ for the dual. This is seen to be 
equal to the value of our guess for the primal, and thus by weak duality 
we have that both are optimal and that $r(z, P_{(z)}) = \frac{-\log(1-z)}{2z}$. ■ 
Restated in terms of $r$, the above lemma says that for 
$\log 2 \le r \le \frac{3}{4} \log 3$ we have that the optimal distribution 
is $P_{(r)}(2) = 1$. 
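The weak-duality argument of Lemma 3 is easy to check numerically. The sketch below evaluates the primal candidate $a(2) = -\log(1-z)/(2z)$ on a grid and compares its value with that of the two-point dual solution; the helper names and the grid size are assumptions made for illustration.

```python
import math

def primal_slack(z, t):
    """A'(t) + log(1-t) for a(2) = -log(1-z)/(2z) and a(i) = 0 otherwise."""
    a2 = -math.log(1.0 - z) / (2.0 * z)
    return 2.0 * a2 * t + math.log(1.0 - t)

def dual_value(z):
    """E[-log(1-X)] when X = z with probability 1/(2z) and X = 0 otherwise."""
    return (1.0 / (2.0 * z)) * -math.log(1.0 - z)

def check_lemma3(z, grid=1000):
    """Return (primal feasible?, dual feasible?, primal value, dual value)."""
    a2 = -math.log(1.0 - z) / (2.0 * z)          # primal objective: sum_i a(i)
    feasible_primal = all(
        primal_slack(z, z * j / grid) >= -1e-12 for j in range(grid + 1)
    )
    mass = 1.0 / (2.0 * z)                       # mass placed on the point z
    # Dual constraints for m = 2: total mass <= 1 and E[X] <= 1/2.
    feasible_dual = mass <= 1.0 + 1e-12 and mass * z <= 0.5 + 1e-12
    return feasible_primal, feasible_dual, a2, dual_value(z)
```

For $z = 0.6$ both solutions are feasible and share the same value, as the lemma states. For $z = 0.4$ the dual candidate would need mass $1.25$ on the point $z$ and is no longer a distribution, which is consistent with the lemma being stated only for $z \ge \frac{1}{2}$.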

Unfortunately, we have not been able to extend the above 
ideas of linear programming duality to exactly characterize $P_{(z)}$ 
or $r(z, P_{(z)})$ for values of $z > \frac{2}{3}$. Nevertheless, we can use 
linear programming to provide outer bounds. 

Specifically, note that the value of any feasible solution of 
the dual problem (5) will be a lower bound on the value of 
the primal problem, and thus on $r(z, P_{(z)})$. The dual problem 
can be approximately solved numerically by assuming that 
the optimal $f(x)$ has support only on a uniform grid of 
fine granularity over $t \in [0, z]$. Under this assumption the 
dual problem becomes a standard linear program with finite 
numbers of variables and constraints. Its value will be a true 
lower bound on $r(z, P_{(z)})$. This outer bound with a grid 
spacing of 0.001 is shown in Figure 1. 
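One way to set up the discretized dual is sketched below using `scipy.optimize.linprog`. This is an illustrative setup and not the authors' code: the function name `dual_outer_bound`, the number of grid points, and the rule used to pick $m$ from $z$ are assumptions. The variables are the masses placed on the grid points, and the $i = 1$ constraint is written as "total mass at most one", which is equivalent here because any leftover mass can sit at the origin at no cost.

```python
import math
from scipy.optimize import linprog

def dual_outer_bound(z, num_points=1001):
    """Lower bound on r(z, P_(z)) from the dual (5) restricted to a uniform grid."""
    # Smallest integer m with z <= m/(m+1); the small offset guards against
    # floating-point noise at the endpoints z = m/(m+1).
    m = max(1, math.ceil(z / (1.0 - z) - 1e-12))
    grid = [z * j / (num_points - 1) for j in range(num_points)]
    # Maximize sum_j p_j * (-log(1 - t_j)), i.e. minimize the negated sum.
    c = [math.log(1.0 - t) for t in grid]
    # Moment constraints E[X^(i-1)] <= 1/i for i = 1..m.
    A_ub = [[t ** (i - 1) for t in grid] for i in range(1, m + 1)]
    b_ub = [1.0 / i for i in range(1, m + 1)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun
```

Because the grid contains both $0$ and $z$, the value returned for $z \in [\frac{1}{2}, \frac{2}{3}]$ should agree with the closed form $-\log(1-z)/(2z)$ of Lemma 3 up to solver tolerance, which gives a simple sanity check before using the routine for $z > \frac{2}{3}$.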

The outer bounds as computed above serve as a benchmark 
to evaluate the merit of any particular candidate distribution $P$ 
in the range $z \in (\frac{2}{3}, 1)$ where the corresponding exact $P_{(z)}$ is 
not known. In the following we develop, for each $z \in (\frac{2}{3}, 1)$, 
a corresponding distribution. The performance of these distributions 
is plotted in Figure 1 and we see that it is close to that of the 
outer bound. 

The intuition behind the design is as follows. Recall that 
with the optimal $P_{(z)}$ for $z \in [0, \frac{2}{3}]$, the constraint of the 
primal problem is tight only at points $t = 0$ and $t = z$.⁴ 
However, for $z \in (\frac{2}{3}, 1)$ the constraints may be tight at other 
points besides $0$ and $z$.⁵ Nevertheless, we can design the $P_{(z)}$ 
so that the constraints of the primal (4) are tight only at $t = 0$ 
and $t = z$. We do so in the following lemma. 

Lemma 4: Given a fixed $z \in (\frac{2}{3}, 1)$, let integer $m$ be such 
that $\frac{m-1}{m} \le z \le \frac{m}{m+1}$, and let $P_{(z)}$ be defined by 

$$ P_{(z)}(i) = \begin{cases} \dfrac{1}{a\, i(i-1)} & \text{for } 2 \le i \le m-1 \\ 1 - \dfrac{m-2}{a(m-1)} & \text{for } i = m \\ 0 & \text{for all other } i \end{cases} $$

where 

$$ a = \frac{m-1}{m} + \frac{1}{m z^{m-1}} \sum_{i \ge m} \frac{z^i}{i} \tag{6} $$

This $P_{(z)}$ represents a probability distribution, and we have 
that 

$$ a\mathcal{P}'_{(z)}(t) + \log(1-t) \ge 0 \quad \text{for } 0 \le t \le z $$

So the asymptotic fraction of decoded inputs will satisfy 
$s(a, P_{(z)}) = z$. This is the same as saying $r(z, P_{(z)}) = a$. 

Proof: 

We first verify that $P_{(z)}$ is a probability distribution. 
Note that $P_{(z)}(i) \ge 0$ for all $a \ge \frac{m-2}{m-1}$ and further that 

$$ a \sum_{i=2}^{m-1} P_{(z)}(i) = \sum_{i=2}^{m-1} \frac{1}{i(i-1)} = \frac{m-2}{m-1} $$

and thus $\sum_i P_{(z)}(i) = 1$. So it is a probability distribution for 
any $a \ge \frac{m-2}{m-1}$. We now prove the second property. It is easy 
to verify that 

$$ a\mathcal{P}'_{(z)}(t) + \log(1-t) = a m t^{m-1} - (m-1)t^{m-1} - \sum_{i \ge m} \frac{t^i}{i} $$

So, for $t > 0$, $a\mathcal{P}'_{(z)}(t) + \log(1-t) \ge 0$ if and only if 

$$ a \ge \frac{m-1}{m} + \frac{1}{m t^{m-1}} \sum_{i \ge m} \frac{t^i}{i} $$

Note that in the above expression the RHS is a strictly 
increasing function of $t$, and that the value of $a$ - given in (6) 
- is exactly the RHS of the above expression at $t = z$. Hence 
the above expression is true for $0 < t \le z$, proving the lemma. The fact that 
$s(a, P_{(z)}) = z$ follows from the above and the definition (2). ■ 

⁴ In terms of the dual (5), this means that the optimal $f(x)$ will have support 
only on the points $x = 0$ and $x = z$. 

⁵ Indeed, numerical simulations seem to indicate that there are always a 
finite number of support points, but that this number increases as $z \to 1$. 
This is consistent with the fact that for $z = 1$ the optimal soliton distribution 
$I$ will result in the optimal dual having support on the entire $[0, 1]$ interval. 

In light of the above lemma, we can now see for each $z$ how 
the corresponding $r(z, P_{(z)}) = a$ as defined by (6) compares to 
the numerically computed outer bound for that $z$ obtained by 
discretizing the dual. This comparison is made in the second 
subfigure of Figure 1. 
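The inner-bound points can be computed directly from (6). The sketch below is an illustrative helper, not the authors' code; the name `lemma4_distribution` and the rule for choosing $m$ are assumptions, and the tail sum is evaluated through the identity $\sum_{i \ge m} z^i/i = -\log(1-z) - \sum_{i < m} z^i/i$.

```python
import math

def lemma4_distribution(z):
    """Return (a, P) for the construction of Lemma 4.

    P is a dict {degree: probability}; m is the smallest integer with
    z <= m/(m+1), so that (m-1)/m <= z <= m/(m+1).
    """
    m = max(1, math.ceil(z / (1.0 - z) - 1e-12))
    if m < 3:
        raise ValueError("the construction is stated for z > 2/3")
    # tail = sum_{i >= m} z^i / i
    tail = -math.log(1.0 - z) - sum(z ** i / i for i in range(1, m))
    a = (m - 1) / m + tail / (m * z ** (m - 1))      # equation (6)
    P = {i: 1.0 / (a * i * (i - 1)) for i in range(2, m)}
    P[m] = 1.0 - (m - 2) / (a * (m - 1))             # remaining mass on degree m
    return a, P
```

For $z = 0.8$ this gives $m = 4$ and $a \approx 0.9057$, so under this construction a fraction $0.8$ of the inputs is predicted to be recoverable from roughly $0.906k$ received outputs.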

V. Discussion 

Note that the approximate distributions $P_{(z)}$ presented in 
Lemma 4 are pretty close to the ideal soliton $I$. In particular, as 
$z \to 1$, $P_{(z)} \to I$. It can thus be thought of as an appropriate 
truncation and rescaling of the ideal soliton $I$. 

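A different truncation and rescaling of $I$ is given in the 
paper on Raptor codes [2, Sec. 6]: for $\epsilon > 0$ they define 
$D = \lceil 4(1 + \epsilon)/\epsilon \rceil$ and $\mu = \epsilon/2 + (\epsilon/2)^2$ and a probability 
distribution whose generating function is 

$$ \Omega_D(t) = \frac{1}{\mu + 1}\left( \mu t + \sum_{i=2}^{D} \frac{t^i}{(i-1)\,i} + \frac{t^{D+1}}{D} \right) $$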

It is shown that this distribution can recover $z = 1 - \delta$ from 
$r = 1 + \epsilon$, and thus can get within $\epsilon$ of capacity. Our results 
suggest that using the distribution $P_{(1-\delta)}$ as developed in [4] 
will give better performance, because it will enable recovery 
of $z = 1 - \delta$ with $r < 1$ as opposed to $r = 1 + \epsilon$ as is the 
case in [2]. 

While the above distinction is small for small $\epsilon$ and $\delta$, it 
will be significant for moderate values. This may be of interest 
if the pre-code in [2] is of small length and not too close to 
capacity achieving, and thus requires a larger $\delta$. 

Solving for the exact optimal distribution in the $z \in (\frac{2}{3}, 1)$ 
region remains of interest. 